GYM: A Multiround Join Algorithm In MapReduce And Its Analysis
Abstract
We study the problem of computing the join of n relations in multiple rounds of MapReduce. We introduce a distributed and generalized version of Yannakakis's algorithm, called GYM. GYM takes as input any generalized hypertree decomposition (GHD) of a query of width w and depth d, and computes the query in O(d + log(n)) rounds and O(n(IN^w + OUT)^2 / M) communication cost, where M is the memory available per machine in the cluster and IN and OUT are the sizes of the input and output of the query, respectively. M is assumed to be IN^(1/ε), for some constant ε > 1. Using GYM we achieve two main results: (1) Every width-w query can be computed in O(n) rounds of MapReduce with O(n(IN^w + OUT) ...
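For orientation, the classical serial form of Yannakakis's algorithm, which GYM distributes and generalizes, can be sketched as below. This is a minimal in-memory illustration and not the GYM algorithm itself: it assumes an acyclic query whose join tree (the width-1 GHD case) is already given, and every name in it (`semijoin`, `join`, `yannakakis`, the relations R, S, T) is hypothetical.

```python
# Serial, in-memory sketch of Yannakakis's algorithm on a given join tree.
# Assumption: relations are lists of dicts and `parent` encodes a valid join tree.

def _key(row, attrs):
    return tuple(row[a] for a in attrs)

def semijoin(left, right):
    """Keep rows of `left` that match some row of `right` on their shared attributes."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    keys = {_key(row, shared) for row in right}
    return [row for row in left if _key(row, shared) in keys]

def join(left, right):
    """Natural join of two relations using a hash index on the shared attributes."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    index = {}
    for row in right:
        index.setdefault(_key(row, shared), []).append(row)
    return [{**l, **r} for l in left for r in index.get(_key(l, shared), [])]

def yannakakis(relations, parent):
    """relations: name -> rows; parent: name -> parent name (None for the root)."""
    rels = {name: list(rows) for name, rows in relations.items()}

    def depth(node):
        d = 0
        while parent[node] is not None:
            node = parent[node]
            d += 1
        return d

    bottom_up = sorted(rels, key=depth, reverse=True)
    # Upward pass: each parent keeps only rows that match its children.
    for node in bottom_up:
        if parent[node] is not None:
            rels[parent[node]] = semijoin(rels[parent[node]], rels[node])
    # Downward pass: each child keeps only rows that match its parent.
    for node in reversed(bottom_up):
        if parent[node] is not None:
            rels[node] = semijoin(rels[node], rels[parent[node]])
    # Join pass: fold every child into its parent, deepest nodes first.
    for node in bottom_up:
        if parent[node] is not None:
            rels[parent[node]] = join(rels[parent[node]], rels[node])
    root = next(n for n in rels if parent[n] is None)
    return rels[root]

# Chain query R(a, b) JOIN S(b, c) JOIN T(c, d), with S as the root of the join tree.
relations = {
    "R": [{"a": 1, "b": 2}, {"a": 3, "b": 4}],
    "S": [{"b": 2, "c": 5}, {"b": 4, "c": 6}],
    "T": [{"c": 5, "d": 7}],
}
parent = {"S": None, "R": "S", "T": "S"}
result = yannakakis(relations, parent)
```

The two semijoin passes discard rows that cannot contribute to the output before any join is materialized, which is why the serial algorithm's intermediate results stay bounded by the input and output sizes.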
Similar resources
GYM: A Multiround Join Algorithm In MapReduce
Multiround algorithms are now commonly used in distributed data processing systems, yet the extent to which algorithms can benefit from running more rounds is not well understood. This paper answers this question for a spectrum of rounds for the problem of computing the equijoin of n relations. Specifically, given any query Q with width w, intersection width iw, input size IN, output size OUT, ...
GYM: A Multiround Distributed Join Algorithm
Multiround algorithms are now commonly used in distributed data processing systems, yet the extent to which algorithms can benefit from running more rounds is not well understood. This paper answers this question for several rounds for the problem of computing the equijoin of n relations. Given any query Q with width w, intersection width iw, input size IN, output size OUT, and a cluster of mac...
Cost Based Multi-Way Equi-Join Optimization in MapReduce
MapReduce is a prominent programming model built on a shared-nothing architecture for processing big data with a parallel, distributed algorithm on a cluster. Join is an important operation but is very inefficient in MapReduce. In this work, a time cost based evolution model is proposed for multi-way join by considering the time cost calculation. A multi-way join consists of start pattern joins and chai...
How Reduce Side Join Part File Expressions Equal MapReduce Structure into Task Consequences, Performance?
An intention of MapReduce Sets for Reduce side join part file expressions analysis has to suggest criteria how Reduce side join part file expressions in Reduce side join part file data can be defined in a meaningful way and how they should be compared. Similitude based MapReduce Sets for Reduce side join part file Expression Analysis and MapReduce Sets for Assignment is expected to adhere to fu...
Generalized Parallel Join Algorithms and Designing Cost Models
Applications for large-scale data analysis use techniques such as parallel DBMSs, the MapReduce (MR) paradigm, and columnar storage. In this paper we focus on a MapReduce environment. The aim of this work is to compare the different join algorithms and to design cost models for further use in the query optimizer.
Publication date: 2015